
(CVPR 2017) Joint detection and identification feature learning for person search

Xiao T, Li S, Wang B, et al. Joint detection and identification feature learning for person search[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 3376-3385.



1. Overview


1.1. Motivation

  • existing methods mainly focus on matching manually cropped pedestrian images between queries and candidates (assuming perfect detection)


This paper proposes a framework for person search that

  • jointly handles pedestrian detection and person re-identification in a single network
    • the pedestrian proposal net focuses more on recall than on precision
    • proposal misalignments can be further adjusted by the identification net


  • proposes the Online Instance Matching (OIM) loss function
  • collects and annotates a large-scale benchmark dataset

1.2. Comparison of Loss Functions

  • pairwise or triplet loss. the number of potential pairs/triplets grows as O(N^2), so an efficient sampling strategy is needed but hard to design
  • softmax. compares each sample against all classes at the same time; as the number of classes increases, training the big softmax classifier matrix becomes much slower or may even fail to converge
  • OIM.
    • compares mini-batch samples with all registered entries (labeled and unlabeled)
    • unlabeled identities serve as negatives for labeled identities

1.3. Contribution

  • joint optimization of detection and re-identification
  • OIM loss function
  • dataset

1.4. Related Work

1.4.1. Person Re-identification

  • manually designed discriminative features
  • learning feature transforms across camera views
  • learning distance metrics
  • CNN-based features
    • trained with triplet samples
    • trained as identity classification
  • on abnormal images: low-resolution and partially occluded images

1.4.2. Pedestrian Detection

  • hand-crafted features. DPM, ACF, and Checkerboards

1.5. Dataset

  • CUHK03
  • Market-1501
  • Duke



2. Method




2.1. Structure

  • output.
    • the 2048-d pooled feature is projected to an L2-normalized 256-d id-feat, and identities are matched by cosine similarity (see the sketch below)
      • the 2048-d feature is also used for proposal classification and box regression (adjusting misalignments)
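
Below is a minimal PyTorch-style sketch of this identification head: the global-pooled 2048-d feature is projected to 256 dimensions and L2-normalized, so that matching reduces to a dot product. Module and variable names are illustrative, not taken from the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IdHead(nn.Module):
    """Projects the 2048-d pooled feature to an L2-normalized 256-d id-feat."""
    def __init__(self, in_dim: int = 2048, feat_dim: int = 256):
        super().__init__()
        self.proj = nn.Linear(in_dim, feat_dim)

    def forward(self, pooled_feat: torch.Tensor) -> torch.Tensor:
        # pooled_feat: (N, 2048) global-pooled RoI features
        feat = self.proj(pooled_feat)      # (N, 256)
        return F.normalize(feat, dim=1)    # unit L2 norm

# with unit-norm features, cosine similarity between a query and the gallery
# is just a matrix-vector product: sims = gallery_feats @ query_feat
```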

2.2. Online Instance Matching Loss



  • only the labeled and unlabeled identities are considered, while the other proposals are left untouched (a full code sketch follows at the end of this subsection)
  • a lookup table (LUT) V of size L×D stores the features of labeled identities; L: the number of labeled identities, D: feature dimension



  • forward. compute cosine similarities V^T x between the mini-batch sample and all the labeled identities in the LUT
    x. the L2-normalized feature of a labeled identity inside a mini-batch



  • backward. if the target class id is t, update the t-th column of the LUT with a momentum average, v_t ← γ v_t + (1 − γ) x, and then scale it to unit L2-norm



  • many unlabeled identities can be safely used as negative classes for all the labeled identities; their features U are stored in a circular queue of size Q



  • cosine similarities U^T x are also computed between the mini-batch sample and the queued unlabeled identities



The probability of x being recognized as the identity with class-id i:

p_i = exp(v_i^T x / τ) / ( Σ_{j=1..L} exp(v_j^T x / τ) + Σ_{k=1..Q} exp(u_k^T x / τ) )

  • L. the number of different target people
  • Q. the size of the circular queue storing unlabeled people
  • τ. higher temperature leads to softer probability distribution

The probability of x being recognized as the i-th unlabeled identity:

q_i = exp(u_i^T x / τ) / ( Σ_{j=1..L} exp(v_j^T x / τ) + Σ_{k=1..Q} exp(u_k^T x / τ) )

  • Maximization. the OIM objective is to maximize the expected log-likelihood E_x[log p_t], where t is the target class-id of x



  • Degradation



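The following is a minimal sketch of the OIM loss as described above: an L×D lookup table of labeled identities, a circular queue of Q unlabeled features, cosine similarities scaled by 1/τ, a softmax cross-entropy over labeled targets only, and the momentum update plus re-normalization of the LUT. The momentum value (0.5) and all names are assumptions for illustration; this is not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OIMLoss(nn.Module):
    """Sketch of the Online Instance Matching loss (momentum value assumed)."""
    def __init__(self, feat_dim=256, num_labeled=5532, queue_size=5000,
                 temperature=0.1, momentum=0.5):
        super().__init__()
        self.tau = temperature
        self.momentum = momentum
        # L x D lookup table of labeled identities, Q x D circular queue of unlabeled ones
        self.register_buffer("lut", torch.zeros(num_labeled, feat_dim))
        self.register_buffer("queue", torch.zeros(queue_size, feat_dim))

    def forward(self, feats, targets):
        # feats: (N, D) L2-normalized id-feats; targets: (N,) class-id in [0, L) or -1 for
        # unlabeled identities (background proposals are assumed to be filtered out already)
        # cosine similarities against all registered entries, scaled by 1/tau
        logits = torch.cat([feats @ self.lut.t(), feats @ self.queue.t()], dim=1) / self.tau

        labeled = targets >= 0
        # cross-entropy only for labeled samples; assumes the batch contains at least one
        loss = F.cross_entropy(logits[labeled], targets[labeled])

        with torch.no_grad():
            # "backward" bookkeeping: momentum-update and re-normalize LUT columns
            for f, t in zip(feats[labeled], targets[labeled]):
                self.lut[t] = F.normalize(
                    self.momentum * self.lut[t] + (1 - self.momentum) * f, dim=0)
            # push unlabeled identity features into the circular queue (oldest drop out)
            new = feats[~labeled]
            if new.numel() > 0:
                self.queue = torch.cat([new, self.queue], dim=0)[: self.queue.size(0)]
        return loss
```

Because the LUT and the queue are buffers rather than learnable parameters, gradients only flow back into the mini-batch features, which is what makes the loss non-parametric.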

2.3. Drawback of Softmax

  • the classifier matrix suffers from large variance of gradients and cannot be learned effectively
    • there is a large number of identities, each with only a few instances, and each image contains only a few identities
    • more than 5,000 discriminant functions must be learned simultaneously, but each SGD iteration only has positive samples from tens of classes
  • the softmax loss cannot exploit the unlabeled identities
  • OIM is non-parametric.
    • potential drawback. it may overfit more easily; the authors find that projecting features into an L2-normalized low-dimensional subspace helps reduce overfitting

2.4. Scalability

  • when the number of identities increases, computing the full OIM denominator could be time-consuming
  • it can be approximated by sub-sampling the labeled and unlabeled identities (sketch below)
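
A small sketch of this approximation, assuming the denominator is built from random subsets of the LUT and queue entries each iteration (keeping the target entry); the exact sampling scheme is not specified in these notes, so this is only illustrative.

```python
import torch
import torch.nn.functional as F

def subsampled_oim_loss(feat, target, lut, queue, tau=0.1, n_lab=1024, n_unlab=1024):
    """OIM loss for a single labeled sample, approximating the denominator with
    random subsets of the LUT and the unlabeled queue (hypothetical scheme)."""
    lab_idx = torch.randperm(lut.size(0))[:n_lab]
    if (lab_idx == target).sum() == 0:          # always keep the target column
        lab_idx = torch.cat([lab_idx, target.view(1)])
    unlab_idx = torch.randperm(queue.size(0))[:n_unlab]
    sims = torch.cat([lut[lab_idx] @ feat, queue[unlab_idx] @ feat]) / tau
    pos = (lab_idx == target).nonzero(as_tuple=True)[0]  # target position after sub-sampling
    return F.cross_entropy(sims.unsqueeze(0), pos)
```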



3. Dataset


3.1. Come From

  • street scenes shot with hand-held cameras
  • movie snapshots


3.2. Processing

  • pedestrians with heights smaller than 50 pixels are ignored


3.3. Evaluation

  • no overlapping images or labeled identities between the training and test sets

3.4. Metrics

  • cumulative matching characteristics (CMC top-K)
  • mean average precision (mAP); a sketch of both metrics follows below
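
A generic sketch of the two metrics for a single query, assuming gallery items are scored by cosine similarity and a boolean array marks the true matches; this is not the benchmark's official evaluation code.

```python
import numpy as np

def cmc_top_k(sim, gt, k=1):
    """CMC top-k for one query: 1 if any true match appears among the top-k gallery items."""
    order = np.argsort(-sim)            # gallery indices sorted by decreasing similarity
    return float(gt[order[:k]].any())

def average_precision(sim, gt):
    """Average precision for one query; mAP is the mean over all queries."""
    order = np.argsort(-sim)
    hits = gt[order].astype(float)
    if hits.sum() == 0:
        return 0.0
    precision_at_hit = np.cumsum(hits) / (np.arange(len(hits)) + 1)
    return float((precision_at_hit * hits).sum() / hits.sum())
```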



4. Experiments


4.1. Details

  • τ. 0.1
  • size of circular queue. 5,000
  • mini-batch. 2 images
  • learning rate. 0.001, decayed to 0.0001 after 40k iterations (see the sketch below)
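
A sketch of an optimizer/schedule matching the listed settings; the choice of SGD with momentum is an assumption, only the learning-rate values and the 40k decay step come from these notes.

```python
import torch
import torch.nn as nn

def build_optimizer(model: nn.Module):
    # learning rate 0.001, dropped to 0.0001 after 40k iterations (per-iteration stepping assumed)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[40_000], gamma=0.1)
    return optimizer, scheduler
```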

4.2. Detection





4.4. OIM



  • converge faster
  • consistently improves the test performance


4.5. Sub-sample of OIM



  • smaller sub-sample sizes converge faster

4.6. Low-dimensional Subspace



  • projecting features into a proper low-dimensional subspace is very important for regularizing the network training

4.7. Detection Recall



  • a higher detection recall does not necessarily lead to higher person search performance, since the re-id method can still get confused by some false alarms
  • one should not only focus on training re-id methods with manually cropped pedestrians, but should consider the detections jointly under the person search problem setting


  • the larger the gallery size, the more difficult the search becomes
  • all methods may suffer from some common hard samples